.. _Tutorial: using NeurEco python API on a Tabular Classification problem:

Tutorial: using NeurEco python API on a Tabular Classification problem
========================================================================

The following section uses the test case :std:ref:`Gene expression cancer rna sequence test case`. This test case is included in the NeurEco installation package.

Create an empty directory (for example, *geneCancer Example*) and extract the data of the :std:ref:`Gene expression cancer rna sequence test case` test case into it. The directory then contains the following files:

* x_test.csv
* y_test.csv
* x_train_0.csv
* y_train_0.csv
* x_train_1.csv
* y_train_1.csv

Build a model
-----------------

* Import the required libraries (NeurEco and NumPy):

.. code-block:: python

    from NeurEco import NeurEcoTabular as Tabular
    import numpy as np

* Load the training data:

.. code-block:: python

    # The training set is split into two parts: x_train_0.csv / y_train_0.csv and x_train_1.csv / y_train_1.csv
    x_train = []
    y_train = []
    for i in range(2):
        x_name = f"x_train_{i}.csv"
        y_name = f"y_train_{i}.csv"
        # The files are semicolon-separated and start with one header row
        x_train.append(np.genfromtxt(x_name, delimiter=";", skip_header=1))
        y_train.append(np.genfromtxt(y_name, delimiter=";", skip_header=1))
    # Stack the parts row-wise into single arrays
    x_train = np.vstack(x_train)
    y_train = np.vstack(y_train)
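
The loading loop can be wrapped in a small reusable function. The sketch below is a hypothetical helper (``load_parts`` is not part of NeurEco); like the tutorial, it assumes semicolon-separated files with a single header row, named ``<prefix>_0.csv``, ``<prefix>_1.csv``, and so on.

.. code-block:: python

    import numpy as np


    def load_parts(prefix, n_parts, delimiter=";"):
        """Load <prefix>_0.csv ... <prefix>_<n_parts - 1>.csv and stack them row-wise.

        Hypothetical helper: assumes one header row and `delimiter`-separated values.
        """
        parts = []
        for i in range(n_parts):
            parts.append(np.genfromtxt(f"{prefix}_{i}.csv", delimiter=delimiter, skip_header=1))
        return np.vstack(parts)


    # Usage with the tutorial files:
    #     x_train = load_parts("x_train", 2)
    #     y_train = load_parts("y_train", 2)
    #     assert x_train.shape[0] == y_train.shape[0], "inputs and outputs must have the same number of rows"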

* Initialize a NeurEco object to handle the **Classification** problem:

.. code-block:: python

    builder = Tabular.Classifier()

All the methods provided by the **Classifier** class can be listed by printing its *__methods__* attribute:

.. code-block:: python

    print(builder.__methods__)

.. code-block:: text
    
    *** NeurEco Tabular Classifier methods: ***
    - load
    - save
    - delete
    - evaluate
    - build
    - get_input_count
    - get_output_count
    - load_model_from_checkpoint
    - get_number_of_networks_from_checkpoint
    - get_weights
    - export_fmu
    - export_c
    - export_onnx
    - export_vba
    - compute_error
    - plot_network
    - forward_derivative
    - gradient
    - set_weights
    - perform_input_sweep

To see what each parameter of a method does and how to use it, print the method's docstring:

.. code-block:: python

    print(builder.export_c.__doc__)

.. code-block:: text

    exports a NeurEco tabular model to a header file
    :param h_file_path: path where the .h file will be saved
    :param precision: string: optional: "float" or "double": precision of the weights in the h file
    :return: export_status: int: 0 if export is ok, other if otherwise.
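
To get an overview of every method at once, the docstrings can also be read programmatically. The sketch below uses the standard ``inspect`` module to print the first docstring line of each public method of any Python object. ``print_method_summaries`` is a hypothetical helper, and it assumes that the NeurEco methods are ordinary Python callables with docstrings, as the **export_c** example above suggests.

.. code-block:: python

    import inspect


    def print_method_summaries(obj):
        """Print the first docstring line of every public callable attribute of `obj`."""
        for name in sorted(dir(obj)):
            if name.startswith("_"):
                continue  # skip private and dunder attributes such as __methods__
            attr = getattr(obj, name)
            if callable(attr):
                doc = inspect.getdoc(attr) or "(no docstring)"
                print(f"{name}: {doc.splitlines()[0]}")


    # Usage: print_method_summaries(builder)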

* To build the model, run the **build** method with the building parameters adjusted to the problem at hand (see :std:ref:`Build NeurEco Classification model with the Python API`). This example keeps the default settings and only sets the optional paths of the model and checkpoint files and the percentage of the data used for validation:

.. code-block:: python

    builder.build(input_data=x_train, output_data=y_train,
                  # the remaining parameters are optional
                  write_model_to="./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.ednn",
                  checkpoint_address="./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.checkpoint",
                  valid_percentage=33.33)

* When **build** is called, NeurEco starts the building process:

.. code-block:: text

    Validation Percentage will be used to get the validation data. This is due to:
    - one or all the validation data is set to None
    - validation indices is set to None

     info > 
     info >      _   __                ______     
     info >     / | / /__  __  _______/ ____/________ 
     info >    /  |/ / _ \/ / / / ___/ __/ / ___/ __ \
     info >   / /|  /  __/ /_/ / /  / /___/ /__/ /_/ /
     info >  /_/ |_/\___/\__,_/_/  /_____/\___/\____/ 
     info >                  === A D A G O S ===
     info > 
     info > Version: 4.01.2474.0 Compiled with MSVC v1928  Oct 12 2022 Matlab runtime:no
     info > OpenMP: yes
     info > MKL: yes
     info > Reading data files...
     info > Reading Data from C:/Users/Sadok/AppData/Local/Temp/tmplluno8ip/inputs_tab_train.npy
     info > Reading Data from C:/Users/Sadok/AppData/Local/Temp/tmplluno8ip/outputs_tab_train.npy
     info > build for: 5 outputs and 20531 inputs and 640 samples.
     info > Preparing Inputs 
     info > Building Model
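
NeurEco performs the validation split internally, and this tutorial does not describe how it selects the validation samples. Purely to illustrate what ``valid_percentage=33.33`` means, the sketch below holds out a random 33.33 % of the rows with NumPy. ``holdout_split`` is a hypothetical helper and is not NeurEco's selection strategy.

.. code-block:: python

    import numpy as np


    def holdout_split(x, y, valid_percentage, seed=0):
        """Randomly hold out `valid_percentage` percent of the rows for validation.

        Illustration only: NeurEco chooses its own validation samples internally.
        """
        n_samples = x.shape[0]
        n_valid = int(round(n_samples * valid_percentage / 100.0))
        rng = np.random.default_rng(seed)
        perm = rng.permutation(n_samples)  # shuffled row indices
        valid_idx, train_idx = perm[:n_valid], perm[n_valid:]
        return x[train_idx], y[train_idx], x[valid_idx], y[valid_idx]

If 640 samples were split this way, 213 of them would be held out for validation and 427 kept for training.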

During the build, NeurEco saves the intermediate models to the checkpoint file (defined by the parameter **checkpoint_address**).
To load and use the intermediate models from this checkpoint:

* Create a new NeurEco object in which to load the model:

.. code-block:: python

    model = Tabular.Classifier()

* Determine how many intermediate models the checkpoint contains:

.. code-block:: python

    n = model.get_number_of_networks_from_checkpoint("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.checkpoint")

* Load any intermediate model from the checkpoint using its id (ids start at zero). In this example, :math:`n=6` at the time the command was run, and the following command loads the third intermediate model (:math:`id=2`):

.. code-block:: python

    model.load_model_from_checkpoint("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.checkpoint", 2)

Now **model** is a valid **Classifier** model and can be used as usual.

* Check the number of trainable parameters each of the intermediate models has:

.. code-block:: python

    for i in range(n):
        print("Loading model", i, " from checkpoint file:")
        model.load_model_from_checkpoint("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.checkpoint", i)
        print("number of trainable parameters in intermediate model --", i, " is:", model.get_weights().size)

.. code-block:: text

    Loading model 0  from checkpoint file:
    number of trainable parameters in intermediate model -- 0  is: 157
    Loading model 1  from checkpoint file:
    number of trainable parameters in intermediate model -- 1  is: 157
    Loading model 2  from checkpoint file:
    number of trainable parameters in intermediate model -- 2  is: 148
    Loading model 3  from checkpoint file:
    number of trainable parameters in intermediate model -- 3  is: 148
    Loading model 4  from checkpoint file:
    number of trainable parameters in intermediate model -- 4  is: 148
    Loading model 5  from checkpoint file:
    number of trainable parameters in intermediate model -- 5  is: 148
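
Once the parameter counts are known, an intermediate model can be picked programmatically. The sketch below uses the smallest model as the selection criterion, which is only one possible choice and an assumption of this example; ``smallest_model_id`` and ``checkpoint_path`` are hypothetical names. Ties go to the lowest id, which for the counts printed above selects the same model (id 2) as the tutorial.

.. code-block:: python

    def smallest_model_id(param_counts):
        """Return the id of the intermediate model with the fewest trainable parameters.

        Ties are broken in favor of the lowest id.
        """
        return min(range(len(param_counts)), key=lambda i: param_counts[i])


    # Counts printed by the loop above. With NeurEco they can be collected with the
    # same calls (checkpoint_path is a hypothetical variable holding the checkpoint path):
    #     param_counts = []
    #     for i in range(n):
    #         model.load_model_from_checkpoint(checkpoint_path, i)
    #         param_counts.append(model.get_weights().size)
    param_counts = [157, 157, 148, 148, 148, 148]
    best_id = smallest_model_id(param_counts)
    print(best_id)  # prints 2

The selected model can then be reloaded with ``model.load_model_from_checkpoint(checkpoint_path, best_id)``.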

Evaluate a model
-------------------

* Load the testing data from the CSV files:

.. code-block:: python

    x_test = np.genfromtxt("x_test.csv", delimiter=";", skip_header=1)
    y_test = np.genfromtxt("y_test.csv", delimiter=";", skip_header=1)

* Create a **Classifier** object to use for the evaluation:

.. code-block:: python

    evaluator = Tabular.Classifier()

.. note::
    If the evaluation is done right after the **build** and the **Classifier** object **builder** is still available, **builder** can be used instead of creating a new object.

* Load the built model:

.. code-block:: python

    load_state = evaluator.load("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq")

.. note::
    When building or evaluating a NeurEco model, the paths passed to NeurEco methods, whether for a model or for a checkpoint file, can be given with or without their file extension.

* To extract information from the loaded model, such as the number of inputs, the number of outputs and the weights array, run:

.. code-block:: python

    n_inputs = evaluator.get_input_count()
    n_outputs = evaluator.get_output_count()
    weights = evaluator.get_weights()
    print("Number of Inputs:", n_inputs)
    print("Number of Outputs:", n_outputs)
    print("Number of trainable parameters:", weights.size)


.. code-block:: text

    Number of Inputs: 20531
    Number of Outputs: 5
    Number of trainable parameters: 148
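
Before evaluating, it can be worth checking that the test arrays match the input and output counts reported by the model (20531 inputs and 5 outputs here). The sketch below is a hypothetical helper, not a NeurEco method, and assumes that both arrays are 2-D with one row per sample.

.. code-block:: python

    import numpy as np


    def check_shapes(x, y, n_inputs, n_outputs):
        """Raise ValueError if x / y do not match the model's input / output counts."""
        x = np.asarray(x)
        y = np.asarray(y)
        if x.ndim != 2 or x.shape[1] != n_inputs:
            raise ValueError(f"expected inputs of shape (n, {n_inputs}), got {x.shape}")
        if y.ndim != 2 or y.shape[1] != n_outputs:
            raise ValueError(f"expected outputs of shape (n, {n_outputs}), got {y.shape}")
        if x.shape[0] != y.shape[0]:
            raise ValueError(f"{x.shape[0]} input rows but {y.shape[0]} output rows")


    # Usage: check_shapes(x_test, y_test, n_inputs, n_outputs)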

* To plot the network graph (this requires the *matplotlib* library to be installed, see :std:ref:`Plot a NeurEco network Classification Python API`):

.. code-block:: python

    evaluator.plot_network()

.. figure:: ./images/GeneCancerPythonNetworkPlot.png
    :width: 800
    :alt: GeneCancerPythonNetworkPlot
    :align: center

    Python API operations: plotting a network: test case - GeneCancer 


* To evaluate the model on the test data:

.. code-block:: python

    neureco_outputs = evaluator.evaluate(x_test)
    l2_error = evaluator.compute_error(neureco_outputs, y_test)
    print("L2 relative error (%):", 100 * l2_error)

.. code-block:: text

    L2 relative error (%): 0.0
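
For a classifier, accuracy and the confusion matrix are often easier to interpret than a relative L2 error. The sketch below assumes that ``neureco_outputs`` and ``y_test`` are both ``(n_samples, 5)`` arrays with one column per class (one-hot targets), so that the predicted class is the column with the largest score. ``accuracy_and_confusion`` is a hypothetical helper, not a NeurEco method.

.. code-block:: python

    import numpy as np


    def accuracy_and_confusion(outputs, targets):
        """Accuracy and confusion matrix from per-class scores and one-hot targets.

        Rows of the confusion matrix are true classes, columns are predicted classes.
        """
        predicted = np.argmax(outputs, axis=1)
        true = np.argmax(targets, axis=1)
        n_classes = targets.shape[1]
        confusion = np.zeros((n_classes, n_classes), dtype=int)
        np.add.at(confusion, (true, predicted), 1)  # count each (true, predicted) pair
        accuracy = float(np.mean(predicted == true))
        return accuracy, confusion


    # accuracy, confusion = accuracy_and_confusion(neureco_outputs, y_test)
    # print("Accuracy (%):", 100 * accuracy)
    # print(confusion)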

.. note::
  During evaluation, the model normalizes the data itself, using the global normalization parameters computed during the **build** rather than parameters computed from the data set being evaluated.
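
The principle in the note above can be illustrated with a generic per-feature standardization: the statistics are computed once on the training data and then reused, unchanged, on any data evaluated later. This is only a sketch of the idea; the tutorial does not state which normalization NeurEco applies, and ``fit_normalizer`` and ``apply_normalizer`` are hypothetical helpers.

.. code-block:: python

    import numpy as np


    def fit_normalizer(x_train, eps=1e-12):
        """Per-feature mean and standard deviation, computed once on the training data."""
        mean = x_train.mean(axis=0)
        std = x_train.std(axis=0)
        # Constant features would cause a division by zero; leave them unscaled.
        std = np.where(std < eps, 1.0, std)
        return mean, std


    def apply_normalizer(x, mean, std):
        """Normalize any data set with the statistics stored at build time."""
        return (x - mean) / std

With this scheme, ``apply_normalizer(x_test, *fit_normalizer(x_train))`` gives the same result for a given test sample regardless of which other samples are in the test set.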

* To save the model in the native NeurEco binary format:

.. code-block:: python

    save_state = evaluator.save("./GeneExpressionCancerRnaSeqModel/NewDir/SameModel")

* To export the model, run one or more of the following commands (an *embed* license is required):

.. code-block:: python

    evaluator.export_c("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.h", precision="float")
    evaluator.export_onnx("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.onnx", precision="float")
    evaluator.export_fmu("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.fmu")
    evaluator.export_vba("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.bas")
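
According to the docstring printed earlier, **export_c** returns 0 when the export succeeds. The sketch below checks such status codes and reports every failed export at once. It assumes that the other export methods follow the same convention, which this tutorial does not confirm, and ``check_export_statuses`` is a hypothetical helper.

.. code-block:: python

    def check_export_statuses(statuses):
        """Raise RuntimeError listing every export whose status code is not 0."""
        failed = {name: code for name, code in statuses.items() if code != 0}
        if failed:
            details = ", ".join(f"{name} (status {code})" for name, code in failed.items())
            raise RuntimeError(f"export failed: {details}")


    # Assumes each export method returns an int status, as documented for export_c:
    #     statuses = {
    #         "C header": evaluator.export_c("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.h", precision="float"),
    #         "ONNX": evaluator.export_onnx("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq.onnx", precision="float"),
    #     }
    #     check_export_statuses(statuses)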

.. warning::
  Once a NeurEco object is no longer needed, free its memory by calling its **delete** method. In the example above, three objects must be deleted:

  .. code-block:: python

    builder.delete()
    evaluator.delete()
    model.delete()
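
To make sure **delete** is called even when an exception interrupts the script, the call can be placed in a ``try``/``finally`` block. The sketch below wraps this pattern in a small context manager built with the standard ``contextlib`` module. ``deleting`` is a hypothetical helper that relies only on the **delete** method shown above.

.. code-block:: python

    from contextlib import contextmanager


    @contextmanager
    def deleting(obj):
        """Yield `obj` and call its delete() method when the with-block exits, even on error."""
        try:
            yield obj
        finally:
            obj.delete()


    # with deleting(Tabular.Classifier()) as evaluator:
    #     evaluator.load("./GeneExpressionCancerRnaSeqModel/GeneExpressionCancerRnaSeq")
    #     neureco_outputs = evaluator.evaluate(x_test)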
